Large Deviations of the Maximum Eigenvalue for Wishart and Gaussian Random Matrices 
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We present a simple Coulomb gas method to calculate analytically the probability of rare events where the 
maximum eigenvalue of a random matrix is much larger than its typical value. The large deviation function that 
characterizes this probability is computed explicitly for Wishart and Gaussian ensembles. The method is quite 
general and applies to other related problems, e.g. the joint large deviation function for large fluctuations of top 
eigenvalues. Our results are relevant to widely employed data compression techniques, namely the principal 
components analysis. Analytical predictions are verified by extensive numerical simulations. 
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Rare events where one of the eigenvalues of a random ma- 
trix is much larger than the others play an important role in 
data compression techniques such as the "Principal Compo- 
nents Analysis" (PCA). PCA is a very useful method to detect 
hidden patterns or correlations in complex, high-dimensional 
datasets. A non-exhaustive list of applications includes image 
processing [1, 2, 3], biological microarrays [4, 5], population 
genetics [6, 7], finance [8, 9], meteorology and oceanogra- 
phy [ 1 0] . The main idea behind PCA is very simple. Consider 
a rectangular (M x N) matrix X whose entries Xij represent 
some data. For instance, Xij might represent examination 
marks of the i-th student (1 < i < M) in the j-th subject 
(physics, mathematics, chemistry, etc., with 1 < j < N). The 
product (N x N) symmetric matrix W = X T X represents 
the covariance matrix of the data and it contains information 
about correlations. In PCA, one first identifies eigenvalues 
and eigenvectors of W. The data are maximally scattered and 
correlated along the eigenvector ("principal component") as- 
sociated with the largest eigenvalue A max - The scatter pro- 
gressively reduces as lower and lower eigenvalues are con- 
sidered. The subsequent step is the reduction of data dimen- 
sionality, achieved by setting to zero those components corre- 
sponding to low eigenvalues. The rationale is that retaining 
the largest components will preserve the major patterns in the 
data and only minor variations are filtered out. 

The above description of PCA makes it clear that the effi- 
ciency of the method crucially depends upon the gap between 
the top eigenvalues and the "sea" of intermediate and small 
eigenvalues. In particular, the further is the maximum eigen- 
value A max spaced from all the others, the more effective the 
projection and the compression procedure will be. A ques- 
tion then naturally arises: how good is PCA for random data? 
This issue has a twofold interest. First, in many situations, the 
data are high-dimensional and have random components. Sec- 
ond, random ensembles provide null models needed to gauge 
the statistical significance of results obtained for non-random 
datasets. To address the question just formulated, one needs to 
compute the probability of rare events where the largest eigen- 
value A m ax has atypically large fluctuations. The purpose of 
this Letter is to provide a simple physical method, based on 
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FIG. 1: The dashed line shows schematically the Marcenko-Pastur 
average density of states for Wishart matrices with the aspect-ratio 
parameter c = N/M = 1 and the full line is the distribution of the 
maximum eigenvalue A ma x. The PDF is centered around the mean 
value (Amax) = 4iV and typically fluctuates over a scale of width 
N 1 / 3 . The probability of fluctuations on this scale is described by 
the known Tracy-Widom distribution (green curve). The red (blue) 
line on the right (left) describes the right (left) large deviation tail of 
the PDF, which is the object of interest in this paper. 



the Coloumb gas method in statistical physics, that allows us 
to compute analytically the probability of these rare events for 
a general class of random matrices. 

Let us start by considering Wishart matrices [11], which 
are directly related to PCA and multivariate statistics [12]. 
Wishart matrices are defined via the product W = X T X 
of a (M x N) random matrix X having its elements 
drawn independently from a Gaussian distribution, P[X] oc 
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The Dyson indices (3 = 1,2 correspond 
respectively to real and complex X [13]. Without any loss of 
generality, we will assume hereafter that M > N. In addition 
to the aforementioned PCA applications, Wishart matrices ap- 
pear in antenna selection in communication technology [14], 
nuclear physics [15], quantum chromodynamics [16], statis- 
tical physics of directed polymers in random media [17] and 
nonintersecting Brownian motions [18]. 
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The spectral properties of W — X T X are well-known. For 
example, N eigenvalues {Ai}'s of W are nonnegative random 
variables with a joint probability density function (PDF) [ 1 9] 

N 

i=i j<fc 

where a = (1 + M — N) —2/(3. This can be written as 
P [{A;}] oc exp [-0E ({A;}) /2], with the energy 
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coinciding with that of a 2-d Coulomb gas of charges with 
coordinates {A^}. Charges are confined to the positive half- 
line in the presence of an external linear+logarithmic poten- 
tial. The external potential tends to push the charges towards 
the origin, whilst the Coulomb repulsion tends to spread them 
apart. A glance at (2) indicates that these two competing 
mechanisms balance for values of A scaling as ~ N. Indeed, 
from the joint PDF (1), one can calculate the average density 
of eigenvalues, p N {\) = ± J2iLi( S ( X ~ A «)> ~ ]?f (&)> 
with the Marcenko-Pastur (MP) [20] scaling function : 
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Here, c = N/M (with c < 1) and the upper and lower edges 
are b = (c^ 1 / 2 + l) 2 and a = (c.- 1 / 2 - l) 2 . For all c < 1, 
the average density vanishes at both edges of the MP sea. For 
the special case c = 1, we have a = 0, b = 4 and the average 
density j(x) = ^\/(4 — x)/x for < x < 4, diverges as 
x~ x / 2 at the lower edge (see Fig. 1). 

The MP expression shows that the maximum eigenvalue 
Amax has the average value (A max ) sw bN for large N. Typ- 
ical fluctuations of A max around its mean are known to be 
of 0(7VV3) [12j 17] More specifically, A max = bN + 
c !/6 j,2/3 ^ where \ has an A-independent limiting 
PDF, .9/3 (x), the well-known Tracy-Widom (TW) density [21]. 
The TW distribution for ft = 1, 2 has asymmetric tails [21] 
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as x ->■ -oo , (4) 
as x — > oo. (5) 



In contrast, the probability of atypically large, e.g. ~ O(N), 
fluctuations of A max from its mean bN are not captured by the 
TW distribution. Note that these configurations are precisely 
those relevant here, as they are ideally suited for the PCA to 
work accurately. 

How does the PDF P(A max ,A) look like for |A max — 
bN\ > OiN 1 / 3 ) where the TW form is no longer valid? Us- 
ing general large deviation principles, Johansson [17] proved 
that for large fluctuations ~ O(N) from its mean, the PDF 



-P(A max = t, N) has the form (for large N) : 
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where $±(x) are the right (left) rate functions for the large 
positive (negative) fluctuations of A max . The challenge is 
to explicitly compute their functional forms. The approach 
developed for Gaussian matrices [22] allows to compute the 
left function $_(x) [23] but it does not apply to the right 
tail. The problem of computing the right function $+(x') is 
solved hereafter. This is followed by the application of the 
new method to Gaussian matrices and further generalizations. 

The starting point of our method to compute is 
the energy expression (2). The Coulomb gas physics sug- 
gests that when the rightmost charge is moved to the right, 
Amax — bN ~ O(N), the MP sea is a priori not subject 
to forces capable of macroscopic rearrangements. Following 
this physical picture, the right rate function is determined by 
the energy cost in pulling the rightmost charge in the exter- 
nal potential of the Coulomb gas and the interaction of the 
charge with the unperturbed MP sea. This energy cost for 
Amax = t 3> bN can be estimated for large N using Eq. (2) 

AE(t) =t-a ln(i) - 2N J In \t - A| p N {\) dX , (7) 

where pjv(A) is the MP average density of charges and the 
integral describes the Coulomb interaction of the rightmost 
charge with the MP sea. This energy cost expression is valid 
up to an additive constant, which is chosen such that AE(t = 
bN) = since our reference configuration is the one where 
Amax = bN. For large N, we scale t = zN, use the MP 
expression (3) and the energy cost finally takes the form 
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ln(z) -2 / ln(z-z')f(z')dz', (8) 
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valid for z > b and up to an additive constant. The probabil- 
ity of such a configuration is P(z, N) oc exp [—(3AE(z)/2]. 
Making a shift of variable z = b + x, it follows that P(t, N) 
for large N and for t—bN ~ O(N) agrees with the large devi- 
ation behavior in Eq. (6). Progress is that we also have derived 
the explicit expression of the right rate function < I ) +(x) 
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where x > and the additive constant was chosen to have 
<I>+(0) = 0. The integral can be performed exactly and ex- 
pressed as a hypergeometric function. For c = 1 (a = and 
b = 4), we obtain 



$+(x) = _ i n ( x + 4 ) + G ( 
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FIG. 2: Numerical results (circles) for the maximum eigenvalue dis- 
tribution - In P(A max = t, TV) vs the scaled variable (t - 47V) /TV. 
Here, TV — 10, Wishart matrices are real (/3 = 1) and c = 1. Com- 
parisons are with the Tracy- Widom distribution (red line) and the ex- 
act right (green line) and left (blue line) large deviation predictions. 



where G(z) =3 F2 [{1, 1, 3/2}, {2, 3}, z] is a hypergeometric 
function (with a lengthy but explicit expression in terms of 
elementary functions). For the sake of comparison, we also 
report the simpler expression of the left rate function [23]: 
= \n(2/V4~~^) - x/8 - x 2 /64forx > 0. 
The asymptotics of <f>+ (x) can be easily worked out from 
Eq. (9). For large x, &+(x) ~ x/2 independently of c, while 
the function has a nonanalytic behavior for small x : 
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This shows that, as A r] 
side, the PDF of A m 
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- bN < 0(7V) from the right 
= t in Eq. (6) behaves as 
~a/36) (t/N - 6) 3/2 ] . Expressing the expo- 
nent in terms of the TW variable \ = c~ 1/6 6~ 2 / 3 7V~ 1/3 (7 - 
67V), we recover exactly the right tail behavior of the TW den- 
sity in Eq. (5). Thus, the large deviation function $>+(x) 
matches, for small arguments x, the behavior of the TW den- 
sity at large arguments. This is quite consistent with the 
fact that the scales of the fluctuations for TW and 3>+(x) are 
0{N 1 /' i ) and O(N), respectively. In fact, our method pro- 
vides, as a bonus, a physical derivation of the right tail behav- 
ior of the TW density (originally derived through the Painleve 
differential equation satisfied by the TW distribution [21]). 

We confirmed theoretical predictions by extensive numer- 
ical simulations. About 10 11 realizations of real {(3 = 1) 
Wishart matrices of sizes TV = 10, 26, 50, 100 and with dif- 
ferent values of c < 1 were efficiently generated using the 
tridiagonal results in [24]. We find very good agreement with 
our analytical predictions for the right large deviations. For 



example, in Fig. (2) we present the results for c = 1 and 
TV = 10. MonteCarlo numerical results are compared to the 
TW density (obtained by numerically integrating the Painleve 
equation satisfied by the TW distribution [21]) and $+(x) in 
Eq. (10), multiplied by TV. For comparison, we also show 
the corresponding left rate function $_(— x) [23] multiplied 
by TV 2 . It is clear that, while the numerical data are well de- 
scribed by the TW density near the peak of the distribution, 
they deviate considerably from TW as one moves into the 
tails, where our large deviation predictions work perfectly. 

Our Coulomb gas method is quite general and it can be ap- 
plied to other related problems. For example, we can compute 
the right large deviation function of A max for Gaussian ran- 
dom matrices. For the latter, the eigenvalues can be positive 
or negative with joint PDF [25] : 
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where the Dyson indices = 1, 2 and 4 correspond to the 
orthogonal, unitary and symplectic ensembles. The quadratic 
nature of the potential in (12), in contrast to the linear term 
appearing in (1), makes that the amplitude of a typical eigen- 
value scales as ~ s/N. The average density of states for large 

TV has the scaling form, /9jv(A) ps -^p/sc ( "^y )' wnere the 



famous Wigner semi-circular law f sc {x) = \/2 — x 2 /ir has 
compact support over [—y/2, y/2]. Thus, (A max ) = s/2N and 
typical fluctuations of A max around its mean are known [21] to 
be TW distributed over a scale of ~ 0(N~ 1 ^ 6 ). Specifically, 

- 1 / 6 v with n o = 1 UP), n , = 9~ 7 /6 



A ra ax = V2TV + ap 7V~ i/b X , with 01,2 = 1/V2, a 4 = 2~ 
and x is a random variable with the TW distribution gp{x)- 
Again, the TW form describes the PDF P(A max = t, TV) only 
in the vicinity oft = V27V over a small scale of ~ 0(7V -1 / 6 ), 
while deviations from the TW form appear in the tails. 

Fluctuations of A max over a scale ~ 0(y/~N) are described 
by large deviation functions, analogous to the Wishart case in 
Eq. (6) but with a different scaling variable 



P(t,7V) - exp 
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As previously mentioned, the left rate function \&_ (x) was re- 
cently computed exactly in Ref. [22], but the right rate func- 
tion v I / +(x) was yet unknown. Our Coulomb gas approach 
allows to solve this problem as well and gives for *f?+(x) : 



*+(.*) = 



1 -HzV2) + ±g(^ 



Az 2 



(13) 



Here, z = A max /TV = x + s/2, the hypergeometric function 
G was defined earlier and the additive constant was chosen to 
have ^+(0) = 0. The asymptotics of ^+(x) can be easily 
derived: for large x, ^+(x) ~ x 2 /2, while the non-analytic 
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x=N~ 1 ' 2 [t-(2N) 1 ' 2 ] 

FIG. 3: Numerical results for the maximum eigenvalue distribution 
(circles) for N = 10 real (J3 = 1) Gaussian matrices, compared with 
the Tracy- Widom result (red line) and the exact right (green line) and 
left (blue line) large deviation functions. 

behavior ^ + {x) w 2 7 / 4 a; 3 / 2 /3 holds for small x. Using the 
TW scaling variable \ = (\ max - V^ptj N 1/6 /ap, with a, 3 
defined after (12), it is easy to check that one recovers the 
correct TW right tails for all j3 = 1, 2 and 4. This provides 
again a physical derivation of the TW right tail. 

We have realized simulations for Gaussian matrices with 
sizes TV = 10, 25 and 50 and for j3 = 1 and 2. In Fig. (3) we 
present the data for the PDF of A max (with N = 10, (3 = 1) 
and compare with the TW form and the exact left function 
ty- [22] and right rate function *B+(x) derived in Eq. (13). 
As in the Wishart case, the TW form works well near the peak 
t = y/2N, but it fails as we move into the tails, where the 
large deviation predictions are quite accurate. 

Our Coulomb gas method lends to further generalizations 
that we only briefly mention here. For instance, we can com- 
pute the joint probability distribution for large fluctuations of 
n top eigenvalues in Wishart and Gaussian random matrices. 
If n -C N, the energy will be given by their mutual charge 
interactions, the external potentials and their interaction with 
the unperturbed MP sea. Integrals are the same as those com- 
puted previously and yield the explicit expression for the joint 
PDF. It is also possible to use our method to compute the large 
deviation function for fluctuations of the smallest eigenvalue 
Amin for Wishart matrices with c < 1. Note that the MP sea 
remains unperturbed (and our method applies) for small fluc- 
tuations of A m i n while the method in [22] applies for large 
fluctuations of A m ; n , which compress the MP sea. 

In conclusion, we have presented a new Coulomb gas 
method to compute large deviation probabilities of top eigen- 
values for a general class of random matrices. The physi- 



cal picture that emerges is quite transparent: when the top 
eigenvalues are pulled to the right (towards large values) the 
Marcenko-Pastur (or Wigner) sea is simply pinched and the 
top eigenvalues do not drag all the other eigenvalues. In other 
words, no macroscopic rearrangement of the sea occurs and 
the top eigenvalues move in the effective potential defined by 
the external potential of the Coulomb gas and by the elec- 
trostatic potential generated by the charges in the sea. Our 
predictions are formally valid for large N yet our simulations 
indicate that they work for moderate N as well. This further 
adds to the relevance of the large deviation rate functions de- 
rived here to data compression methods and their applications. 
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